Languages under the influence: Building a database of Uralic languages

Authors

  • Eszter Simon
  • Nikolett Mus
Abstract

Most Uralic languages lack systematically collected, consistently transcribed and morphologically annotated text corpora. This paper sums up the steps, the preliminary results and the future directions of building a linguistic corpus of four Uralic languages, namely Tundra Nenets, Udmurt, Synya Khanty, and Surgut Khanty. It discusses the experience of building a corpus that contains both old and modern, and both written and oral, data samples, as well as the principles behind data collection strategies for languages with different levels of vitality and endangerment. The methodologies and challenges of data processing and the levels of linguistic annotation are also described in detail.

Similar resources

The developments, uses, and functions of preverbal particles in Hungarian and other Uralic languages

Within the Uralic language family, preverbal particles generally occur only within the Ugric branch of the Finno-Ugric languages, a fact known for some time (cf. Zsirai, 1933). Most Uralic scholars (who assume the existence of Proto-Uralic) hold that preverbal particles are not a Uralic feature, that is, the existence of these particles in a handful of Uralic languages is due to innovations i...

Matti Miestamo (Helsinki): Polar Interrogatives in Uralic Languages: A Typological Perspective

The paper surveys the domain of polar interrogation in the Uralic language family in a typological perspective. An overview of the ways in which polar interrogation is marked in the world’s languages is presented and the encoding of the domain in Uralic languages is examined against this background. All the major types of polar interrogative marking are found in the family. Polar interrogatives...

Computational Morphologies for Small Uralic Languages

This article presents a set of morphological tools for small Uralic languages. Various Hungarian research groups specialized in Finno-Ugric linguistics and a Hungarian language technology company (MorphoLogic) have initiated a project with the goal of producing annotated electronic corpora for small Uralic languages. The languages described include Mordvin, Udmurt (Votyak), Komi (Zyryan), Mansi...

The Influence of Sociological Factors on Usage of Mazandarani Language among the Youth

This research attempts to determine the social roles of two languages, Persian and Mazandarani, in Qaemshahr and their influence on young people's use of these language varieties. In societies with more than one language, we see the collision of languages in various forms. In other words, some consequences of this collision of languages cause the loss of the imp...

Morphological Tools for Six Small Uralic Languages

This article presents a set of morphological tools for six small endangered minority languages belonging to the Uralic language family, Udmurt, Komi, Eastern Mari, Northern Mansi, Tundra Nenets and Nganasan. Following an introduction to the languages, the two sets of tools used in the project (MorphoLogic’s Humor tools and the Xerox Finite State Tool) are described and compared. The article is ...

Publication date: 2017